Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines

Authors

  • Suman Samui
  • Indrajit Chakrabarti
  • Soumya K. Ghosh
Abstract

This paper presents a single-channel speech separation method implemented with a deep recurrent neural network (DRNN) using recurrent temporal restricted Boltzmann machines (RTRBMs). Deep neural network (DNN) based speech separation (denoising) methods perform well compared with conventional statistical-model-based speech enhancement techniques. However, they often ignore the temporal correlations across speech frames, which results in a loss of spectral detail in the reconstructed output speech. To alleviate this issue, one RTRBM is employed to model the acoustic features of the input (mixture) signal, and two RTRBMs are trained for the two training targets (source signals). Each RTRBM models the abstractions present in the training data at each time step as well as the temporal dependencies in the training data. The entire network (three RTRBMs and one recurrent neural network) can be fine-tuned by jointly optimizing the DRNN with an extra masking layer that enforces a reconstruction constraint. The proposed method has been evaluated on the IEEE corpus and the TIMIT dataset for the speech denoising task. Experimental results show that the proposed approach outperforms non-negative matrix factorization (NMF) and conventional DNN- and DRNN-based speech enhancement methods.
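The abstract gives no equations, so what follows is only a minimal NumPy sketch of the two building blocks it names. It assumes the standard RTRBM formulation of Sutskever, Hinton and Taylor and the soft time-frequency masking layer used in earlier DRNN-based separation work, such as the singing-voice separation DRNN of Huang et al. It is not the authors' implementation. The function names `rtrbm_hidden_states` and `soft_mask_layer`, the weight shapes, and the small `eps` term are hypothetical choices made for illustration. `rtrbm_hidden_states` computes the deterministic hidden state r_t = sigmoid(W v_t + W_rec r_{t-1} + b_h), which is how an RTRBM carries context from one frame to the next. `soft_mask_layer` splits the mixture magnitude z between the two predicted sources so that the two estimates sum back to z.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rtrbm_hidden_states(V, W, W_rec, b_h, b_init):
    """Deterministic RTRBM hidden states r_t for a sequence of frames.

    Assumed shapes (illustrative, not from the paper):
      V      (T, n_visible)        acoustic feature frames
      W      (n_hidden, n_visible) visible-to-hidden weights
      W_rec  (n_hidden, n_hidden)  hidden-to-hidden temporal weights
      b_h    (n_hidden,)           hidden bias for frames t >= 1
      b_init (n_hidden,)           hidden bias for the first frame
    """
    T = V.shape[0]
    R = np.zeros((T, W.shape[0]))
    # First frame has no previous hidden state, so it uses b_init.
    R[0] = sigmoid(W @ V[0] + b_init)
    for t in range(1, T):
        # Previous hidden state r_{t-1} shifts the hidden bias at frame t.
        R[t] = sigmoid(W @ V[t] + W_rec @ R[t - 1] + b_h)
    return R


def soft_mask_layer(y1_hat, y2_hat, z, eps=1e-8):
    """Soft time-frequency masking of mixture magnitude z.

    The two masks sum to (almost exactly) one, so the masked
    estimates add back up to z: the reconstruction constraint.
    eps is a hypothetical guard against division by zero.
    """
    a1, a2 = np.abs(y1_hat), np.abs(y2_hat)
    denom = a1 + a2 + eps
    return (a1 / denom) * z, (a2 / denom) * z
```

In a full system, as assumed here, the network output for each source would pass through `soft_mask_layer` before the loss is computed. Because the mask is part of the forward pass, the joint optimization trains the network to predict masks rather than unconstrained spectra.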

Similar Resources

A Hierarchy of Recurrent Networks for Speech Recognition

As recently shown, generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) can accurately model high-dimensional sequences. In these models, temporal dependencies in the input are discovered either by buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the ...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks

Monaural source separation is important for many real world applications. It is challenging since only single channel information is available. In this paper, we explore using deep recurrent neural networks for singing voice separation from monaural recordings in a supervised setting. Deep recurrent neural networks with different temporal connections are explored. We propose jointly optimizing ...

Development of a Sound Coding Strategy based on a Deep Recurrent Neural Network for Monaural Source Separation in Cochlear Implants

The aim of this study is to investigate whether a source separation algorithm based on a deep recurrent neural network (DRNN) can provide a speech perception benefit for cochlear implant users when speech signals are mixed with another competing voice. The DRNN is based on an existing architecture that is used in combination with an extra masking layer for optimization. The approach has been ev...

High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion

This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs) for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for a source and target speaker using speaker-dependent training data. Since each RTRBM att...

Publication date: 2017